<HTML>
<HEAD>
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
   <META NAME="GENERATOR" CONTENT="Mozilla/4.04 [en] (X11; I; IRIX 6.3 IP32) [Netscape]">
   <TITLE>Completeness of distance restraints; methods</TITLE>
<LINK HREF="mailto:jurgenfd@gmail.com" REV="MADE">
</HEAD>
<BODY BGCOLOR="#FFFFFF">

<H1>
Aqua program module</H1>

<H2>
Completeness of distance restraints</H2>

<H3>
Methods</H3>

<HR>
<H3>
Intro</H3>
We have implemented a check on the completeness of NOE restraints in the
AQUA program. The calculated completeness can be useful in the initial
phases of structure determination using NMR by focusing on NOE contacts
in specific regions in a protein or pinpointing problems to specific residues,
atoms or classes of NOE contacts. The completeness check and its application
to a large set of structures is described in:
<UL>
<LI>
J.F. Doreleijers, M.L. Raves, J.A.C. Rullmann &amp; R. Kaptein.
"Completeness of NOEs in proteins: a statistical analysis of NMR data"
<I>J. Biomol. NMR</I> (1999) <B>14</B>, 123-132.
</LI>
</UL>

<HR>
<H3>
Basics</H3>
Based on one or more models of a protein structure, a set of contacts expected
to be observable in a NOESY-type NMR experiment is generated (set B). The
experimentally observed NOEs form set A. The intersection of sets A and B
contains the matched contacts. Completeness is defined as the ratio of the
number of matched contacts to the number of observable model contacts
(set B), expressed as a percentage. The values of the upper bounds (and
lower bounds) are not considered in this analysis.
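<P>The definition above can be illustrated with a minimal Python sketch. It
is not taken from the AQUA source; the function name and the atom labels are
hypothetical, and contacts are treated as unordered atom pairs.</P>
<PRE>
```python
def completeness(noe_contacts, model_contacts):
    """Percentage of observable model contacts (set B) matched by NOEs (set A)."""
    # Contacts are unordered, so each pair is stored as a frozenset.
    set_a = {frozenset(pair) for pair in noe_contacts}
    set_b = {frozenset(pair) for pair in model_contacts}
    if not set_b:
        raise ValueError("no observable model contacts")
    matched = set_a.intersection(set_b)
    return 100.0 * len(matched) / len(set_b)


# Hypothetical atom labels, for illustration only.
noes = [("A5.HA", "L9.HN"), ("L9.HN", "V12.HB"), ("A5.HA", "G20.HN")]
model = [("L9.HN", "A5.HA"), ("L9.HN", "V12.HB"),
         ("V12.HB", "I15.HA"), ("A5.HA", "I15.HA")]
print(completeness(noes, model))  # 50.0
```
</PRE>
<P>The third NOE has no counterpart in set B and therefore does not affect
the result; only the two matched contacts out of four model contacts count.</P>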

<H3>
Cut-off</H3>
In a NOESY-type NMR experiment, contacts between protons separated by up to
~ 5.5 &Aring; can be observed. The maximum distance between protons, up
to which model contacts are calculated, is an important determinant for
the level of completeness and can be supplied to the program as a parameter.
For a low value, like 3.0 &Aring;, nearly all observable model contacts
will have been observed in the NMR experiment. The number of observable
model contacts increases approximately quadratically with this value.

<H3>
<A NAME="Observable atoms"></A>Observable atoms</H3>

Which protons are observable in an NMR experiment? The standard definition
includes all non-exchangeable protons and stereospecifically assignable
protons beyond the beta-protons as pseudoatoms. The stereospecifically
assignable methyl groups of Valine and Leucine are an exception to this
general rule. Other <A HREF="observables.html">definitions</A> of what
is observable can be supplied to the program.

<H3>
<A NAME="Averaging over models"></A>Averaging over models</H3>
Distances are averaged over all provided models of the structure. Usually
a plain arithmetic average is used instead of an r<SUP>-6</SUP> average
because in practice the latter gives too much weight to the shorter
distances.
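<P>The difference between the two averaging schemes can be seen in a small
Python sketch. The function names and the example distances are assumptions
made for illustration; they are not part of AQUA.</P>
<PRE>
```python
def arithmetic_average(distances):
    """Plain mean of the distance over all models."""
    return sum(distances) / len(distances)


def r6_average(distances):
    """r^-6 average: mean of d^-6 over all models, raised to the power -1/6."""
    mean_inv6 = sum(d ** -6 for d in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)


# Hypothetical distances (in Angstrom) for one atom pair in four models.
distances = [3.0, 6.0, 6.0, 6.0]
print(round(arithmetic_average(distances), 2))  # 5.25
print(round(r6_average(distances), 2))          # 3.75
```
</PRE>
<P>A single short distance pulls the r<SUP>-6</SUP> average well below
5 &Aring;, although the pair is far apart in three of the four models.</P>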

<H3>
<A NAME="Stereospecifically assignable protons"></A>Stereospecifically
assignable protons</H3>

The treatment of stereospecifically assignable protons strongly affects
the outcome of the analysis. If pseudoatoms are used instead of stereospecifically
assignable protons, then the completeness of contacts between such atoms
can never reach 100 %. The list of observable atoms should not contain both
the stereospecific protons <B>and</B> the pseudoatoms that represent them.
Restraints with a pseudoatom will be deleted if the same restraint is present
with a stereospecific atom. This will be signalled in the log file. For
Phenylalanine and Tyrosine (QD and QE) and Arginine (QH1, QH2) one cannot
calculate the completeness of stereospecific protons (<I>e.g.</I> Phe HD1),
due to a technical limitation of the program: the pseudoatoms themselves
can also be expanded to QR (Phe and Tyr) and QH (Arg), respectively. These
protons are therefore not included in the set of theoretically observable
protons.
<BR>

Two cases of stereospecificity are distinguished:
<OL>
<LI>
A restraint contains a stereospecific atom (<I>e.g.</I> HD1) and the list
of observable atoms contains the pseudoatom that represents it (<I>e.g.</I>
QD). The restraint will be adjusted (and a pseudoatom correction will be
added to the upper bound) such that the stereospecific atom is mapped to
the pseudoatom (QD). Subsequently the list of restraints is cleaned up
to remove duplicate restraints that could arise from this mapping.
<I>E.g.</I> restraints to X-HD1 and X-HD2 would collapse to a single restraint
to X-QD.</LI>

<LI>
A restraint contains a pseudoatom (<I>e.g.</I> QB) and the list of observable
atoms contains the individual stereospecific atoms (<I>e.g.</I> HB2 and
HB3). The pseudoatom of the restraint will then be matched to only ONE of the
stereospecifically assignable atoms. In the listing per shell (table: COMPLETENESS_PER_SHELL)
this is <I>(a)</I> the atom in the smaller shell (<I>e.g.</I> HB3) or <I>(b)</I>
the first atom in the molecule (<I>e.g.</I> HB2). This prevents a non-existent
violation from being listed as a violation. In the listing per
atom type (table: COMPLETENESS_PER_ATOM) it is the atom that comes first in the molecule.
In general a maximum of 50 % can thus be obtained for the completeness of
this atom type because stereospecific assignments are lacking. No pseudoatom
correction will be added or subtracted.</LI>
</OL>
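<P>The first case can be sketched in Python as follows. This is an
illustration only: the lookup table, the function name and the restraint
representation are hypothetical, and the pseudoatom correction of the upper
bound is not modelled.</P>
<PRE>
```python
# Hypothetical lookup from stereospecific atom names to their pseudoatom.
PSEUDO_OF = {"HD1": "QD", "HD2": "QD", "HB2": "QB", "HB3": "QB"}


def collapse_to_pseudoatoms(restraints, observable):
    """Map stereospecific atoms to observable pseudoatoms, then drop duplicates.

    restraints: list of (partner, atom) tuples for one residue.
    observable: set of atom names regarded as observable.
    """
    seen = set()
    cleaned = []
    for partner, atom in restraints:
        pseudo = PSEUDO_OF.get(atom)
        # Only remap when the atom itself is not observable but its pseudoatom is.
        if atom not in observable and pseudo in observable:
            atom = pseudo
        mapped = (partner, atom)
        if mapped not in seen:
            seen.add(mapped)
            cleaned.append(mapped)
    return cleaned


restraints = [("X", "HD1"), ("X", "HD2"), ("X", "HA")]
print(collapse_to_pseudoatoms(restraints, {"QD", "HA"}))
# [('X', 'QD'), ('X', 'HA')]
```
</PRE>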

<H3>
Completeness and the number of NOEs</H3>
The number of NOEs per residue is often plotted as an indicator of how
well the contacts have been gathered. If the residue type is taken into
account (which is not always the case) this number is informative and is
also provided by this program. A plot of this number versus the sequence,
however, is very scattered. The proposed completeness number, by contrast,
is approximately the same for each residue type, so residues in a sequence
that have a low completeness are easily spotted.
<H3>
Flexibility</H3>
The distance between atoms is averaged over all provided models of the
structure. If this averaged distance is below the maximum distance discussed
above, the contact is considered as an "observable model contact". In case
of conformational variability, the averaged distance will often be above
the threshold. As a consequence, the distance will be discarded from the
set of observable model contacts. This way a flexible region of a protein
can have a completeness similar to that of a rigid region although the latter is
defined by more NOEs per residue.
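<P>A minimal Python sketch of this selection rule, assuming an arithmetic
average over the models; the function name, the cutoff and the distances are
hypothetical values chosen for illustration.</P>
<PRE>
```python
def is_observable_contact(distances, cutoff):
    """True if the distance averaged over all models lies below the cutoff."""
    average = sum(distances) / len(distances)
    return cutoff > average


CUTOFF = 5.0  # assumed maximum distance in Angstrom

# Hypothetical per-model distances for an atom pair in a rigid region
# and for an atom pair in a flexible region.
rigid_pair = [3.8, 4.0, 4.1]
flexible_pair = [3.5, 7.5, 8.0]

print(is_observable_contact(rigid_pair, CUTOFF))     # True
print(is_observable_contact(flexible_pair, CUTOFF))  # False
```
</PRE>
<P>The flexible pair is close in one model only, so it drops out of set B
and a missing NOE for it does not lower the completeness.</P>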
<H3>
Reference values for completeness</H3>
The average completeness for maximum distances of 4.0 and 5.0 &Aring; for
97 studied NMR protein structures was 48 &plusmn; 13 % and 26 &plusmn; 9 %, respectively.
The best structures had a completeness of 76 % (4.0 &Aring;) and 48 % (5.0
&Aring;). Intra-residual contacts were excluded from the analysis. The
set of observable atoms was the standard set (see section: <A HREF="#Observable atoms">Observable
atoms</A>).
<H3>
The program's actions step by step</H3>
The <I>first</I> step is to convert the set of experimentally observed
NOEs to a set A according to the definition of what is actually expected
to be observable. For example, a contact with a labile proton like Serine
HG might have been observed but cannot always be expected to be observed.
If it is not present in the definition then the restraints to it will not
be considered in the analysis. The treatment of contacts with stereospecific
protons is detailed <A HREF="#Stereospecifically assignable protons">here</A>.

<P>The <I>second</I> step creates a set of observable theoretical contacts
(set B). The average distance is calculated
between all atoms which are observable according to the same definition.
Only contacts that have an average distance below the maximum distance
that is specified by the user are retained for set B. When pseudoatoms
are "observable", their position is used instead of calculating the contribution
of each of the constituent atoms. Contacts outside the range selection
(if present) are discarded at this point as well.

<P>In the <I>third</I> step, the intra-residual contacts of both lists
are filtered for redundancy (see 'qhelp <A HREF="../doc/redunchk.txt">redunchk</A>').
Since the intra-residual contacts of set B cannot contain redundant restraints
in the normal sense (<I>e.g.</I> upper-bound above the geometrically possible
distance), the contacts of both sets (A and B) will only be checked for
having fixed distances (<I>e.g.</I> Alanine HA-MB). If an option is set
to discard the intra-residual contacts then these contacts will have been
deleted in the previous two steps and this step will be skipped.

<P>The <I>fourth</I> step will match the contacts of both sets A and B
to each other. In the data block COMPLETENESS_PER_SHELL, and in the <A HREF="compl_output.html">log
file</A> this process can be followed as it loops over a number of equidistant
shells. This way the analysis does not have to be repeated to get the completeness
at different values for the maximum distance. The number of shells and
the minimum and maximum distance for both sets can be specified, as described
<A HREF="complchk.html">here</A>.
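<P>The loop over shells can be pictured with the following Python sketch.
It assumes cumulative shells of equal width between the minimum and maximum
distance; the function name, the argument names and the example data are
hypothetical and do not reflect the actual AquaCompl implementation.</P>
<PRE>
```python
def completeness_per_shell(model_distances, noe_pairs, min_dist, max_dist, n_shells):
    """Return (outer edge, matched, expected, percent) for each cumulative shell.

    model_distances: dict mapping an atom pair to its average model distance.
    noe_pairs: set of atom pairs for which an NOE restraint exists.
    """
    width = (max_dist - min_dist) / n_shells
    rows = []
    for i in range(1, n_shells + 1):
        edge = min_dist + i * width
        # Model contacts from the minimum distance up to this shell's outer edge.
        expected = [pair for pair, dist in model_distances.items()
                    if dist >= min_dist and edge > dist]
        matched = [pair for pair in expected if pair in noe_pairs]
        percent = 100.0 * len(matched) / len(expected) if expected else None
        rows.append((edge, len(matched), len(expected), percent))
    return rows


# Hypothetical average distances (Angstrom) and observed NOE pairs.
model_distances = {("a", "b"): 2.5, ("a", "c"): 3.5, ("b", "c"): 3.8, ("c", "d"): 4.6}
noe_pairs = {("a", "b"), ("b", "c")}

rows = completeness_per_shell(model_distances, noe_pairs, 2.0, 5.0, 3)
for row in rows:
    print(row)
# first row: (3.0, 1, 1, 100.0); last row: (5.0, 2, 4, 50.0)
```
</PRE>
<P>One pass over the data thus yields the completeness at every shell
boundary, so the analysis need not be rerun for each maximum distance.</P>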

<P>The <I>fifth</I> step decomposes the completeness with respect to: class
of contact (intra, sequential etc.), atom type, residue type, residue (perhaps
most useful), atom in a certain residue type, and atom. All these analyses
will produce a separate data block in the output file.
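<P>As an illustration of one such decomposition, the Python sketch below
computes the completeness per residue. It assumes that a contact counts
towards every residue it involves; this convention, the function name and
the example contacts are assumptions, not a description of the AquaCompl
code.</P>
<PRE>
```python
from collections import defaultdict


def completeness_per_residue(model_contacts, noe_contacts):
    """Completeness (in %) per residue number.

    Each contact is a pair of (residue_number, atom_name) tuples.
    """
    noes = {frozenset(contact) for contact in noe_contacts}
    expected = defaultdict(int)
    matched = defaultdict(int)
    for contact in model_contacts:
        hit = frozenset(contact) in noes
        # An intra-residual contact involves one residue and is counted once.
        for residue in {atom[0] for atom in contact}:
            expected[residue] += 1
            if hit:
                matched[residue] += 1
    return {res: 100.0 * matched[res] / expected[res] for res in sorted(expected)}


# Hypothetical contacts, for illustration only.
model_contacts = [((5, "HA"), (9, "HN")), ((9, "HN"), (12, "HB")),
                  ((12, "HB"), (15, "HA")), ((5, "HA"), (15, "HA"))]
noe_contacts = [((9, "HN"), (5, "HA")), ((9, "HN"), (12, "HB"))]

print(completeness_per_residue(model_contacts, noe_contacts))
# {5: 50.0, 9: 100.0, 12: 50.0, 15: 0.0}
```
</PRE>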

<P>The <I>sixth</I> and last step produces statistics on the completeness
of the stereospecific assignments that are implicitly found in set A. Results
can be found in the SSA_PER_... datablocks of the output file.

<P><B>NOTES</B>
<UL>
<LI>
The script '<A HREF="complchk.html">complchk</A>' can be used to run AquaCompl,
including the necessary preparatory steps.</LI>

<LI>
Alternatively, the completeness check can be performed by the AQUA <A HREF="../server/.">server</A>.</LI>

<LI>
More documentation is available on the <A HREF="compl_output.html">output</A>
of the program module AquaCompl.</LI>
</UL>

<HR>

<P>Contact the <A HREF="mailto:jurgenfd@gmail.com">author</A> if you
need help.
</BODY>
</HTML>
